Fisher information matrix
An Approach to the Fisher-Rao Metric for Infinite-Dimensional Non-Parametric Information Geometry
Non-parametric information geometry, being infinite-dimensional, has long faced an "intractability barrier": the Fisher-Rao metric becomes a functional whose inverse is difficult to define. This paper introduces a framework that resolves the intractability through an orthogonal decomposition of the tangent space, $T_f M = S \oplus S^{\perp}$, where $S$ is an observable covariate subspace. From this decomposition we derive the Covariate Fisher Information Matrix (cFIM), denoted ${\bf G}_f$, a finite-dimensional and computable representative of the information extractable from the manifold's geometry. Significantly, by proving the Trace Theorem $H_G(f) = \text{Tr}({\bf G}_f)$, we establish a rigorous foundation for the G-entropy we introduced previously, identifying it as a fundamental geometric invariant: the total explainable statistical information captured by the probability distribution associated with a model. Furthermore, we link ${\bf G}_f$ to the second derivative (i.e., the curvature) of the KL divergence, leading to a Covariate Cramér-Rao Lower Bound (CRLB). We show that ${\bf G}_f$ is congruent to the efficient Fisher information matrix, thereby providing fundamental variance limits for semi-parametric estimators. Finally, we apply the geometric framework to the Manifold Hypothesis, lifting it from a heuristic assumption to a testable condition of rank deficiency in the cFIM. By defining the Information Capture Ratio, we obtain a rigorous method for estimating the intrinsic dimensionality of high-dimensional data. In short, our work bridges the gap between abstract information geometry and the demands of explainable AI by providing a tractable path for assessing the statistical coverage and efficiency of non-parametric models.
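A minimal numpy sketch of the finite-dimensional quantities the abstract names: a covariate Fisher information matrix estimated by projecting per-sample score vectors onto an observable covariate subspace, its trace (the G-entropy of the Trace Theorem), and an information-capture ratio read off its eigenvalue spectrum. The estimator, the function names, and the ICR definition used here are illustrative assumptions, not the paper's construction.

```python
# Sketch (assumed estimator, not the paper's algorithm): build a covariate
# Fisher information matrix G_f by projecting per-sample score vectors onto
# d covariate directions, then read off the trace ("G-entropy") and an
# information-capture ratio from its eigenvalue spectrum.
import numpy as np

def covariate_fim(scores, covariate_basis):
    """scores: (n, p) per-sample score vectors d/dtheta log f(x_i; theta).
    covariate_basis: (p, d) orthonormal columns spanning the covariate subspace S."""
    projected = scores @ covariate_basis          # components of the score in S
    return projected.T @ projected / len(scores)  # empirical d x d matrix G_f

def information_capture_ratio(G_f, k):
    """Fraction of Tr(G_f) carried by the top-k eigenvalues (assumed ICR definition)."""
    eigvals = np.sort(np.linalg.eigvalsh(G_f))[::-1]
    return eigvals[:k].sum() / eigvals.sum()

# Toy usage: scores concentrated in a 2-D subspace -> near rank-2 G_f.
rng = np.random.default_rng(0)
p, d, n = 10, 5, 2000
basis, _ = np.linalg.qr(rng.normal(size=(p, d)))
scores = rng.normal(size=(n, 2)) @ rng.normal(size=(2, p))
G_f = covariate_fim(scores, basis)
print("G-entropy (trace):", np.trace(G_f))
print("ICR(k=2):", information_capture_ratio(G_f, 2))
```

An ICR close to 1 at small $k$, i.e. a nearly rank-deficient ${\bf G}_f$, is the kind of signal the abstract proposes as a testable form of the Manifold Hypothesis.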
An Improved Empirical Fisher Approximation for Natural Gradient Descent
Approximate Natural Gradient Descent (NGD) methods are an important family of optimisers for deep learning models; they use approximate Fisher information matrices to precondition gradients during training. The empirical Fisher (EF) method approximates the Fisher information matrix empirically by reusing the per-sample gradients collected during back-propagation. Despite its ease of implementation, the EF approximation has theoretical and practical limitations. This paper investigates a key weakness of EF that is shown to be a major cause of its poor approximation quality. An improved empirical Fisher (iEF) method is proposed to address this issue; it is motivated as a generalised NGD method from a loss-reduction perspective while retaining the practical convenience of EF.
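For readers unfamiliar with the distinction the abstract relies on, here is a minimal numpy sketch contrasting the standard empirical Fisher with the true Fisher for binary logistic regression. It only illustrates what EF reuses (per-sample gradients evaluated at the observed labels); the paper's iEF correction is not reproduced.

```python
# Sketch: empirical Fisher (EF) vs. true Fisher for binary logistic regression.
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 4
X = rng.normal(size=(n, p))
y = rng.integers(0, 2, size=n)
w = rng.normal(size=p) * 0.1
mu = 1.0 / (1.0 + np.exp(-X @ w))          # model probabilities p(y=1 | x)

# Empirical Fisher: average outer product of per-sample gradients of the NLL,
# evaluated at the observed labels y (what back-propagation already provides).
g = (mu - y)[:, None] * X                  # per-sample gradients, shape (n, p)
F_emp = g.T @ g / n

# True Fisher: expectation over labels drawn from the model itself, which for
# logistic regression reduces to a mu*(1-mu) weighting of the inputs.
F_true = (X * (mu * (1 - mu))[:, None]).T @ X / n

print("||F_emp - F_true||_F =", np.linalg.norm(F_emp - F_true))
```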
Spectral Bias Outside the Training Set for Deep Networks in the Kernel Regime
We provide quantitative bounds measuring the $L^2$ difference in function space between the trajectory of a finite-width network trained on finitely many samples and the idealized kernel dynamics of infinite width and infinite data. An implication of the bounds is that the network is biased to learn the top eigenfunctions of the Neural Tangent Kernel not just on the training set but over the entire input space. This bias depends only on the model architecture and the input distribution, and thus does not depend on the target function, which need not lie in the RKHS of the kernel. The result is valid for deep architectures with fully connected, convolutional, and residual layers. Furthermore, the width does not need to grow polynomially with the number of samples in order to obtain high-probability bounds up to a stopping time. The proof exploits the low-effective-rank property of the Fisher Information Matrix at initialization, which implies a low effective dimension of the model (far smaller than the number of parameters). We conclude that local capacity control via the low effective rank of the Fisher Information Matrix remains theoretically underexplored.
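A toy illustration of the quantity the proof hinges on: the effective rank of the empirical Fisher matrix of a randomly initialized network. The one-hidden-layer tanh network and the participation-ratio definition of effective rank are assumptions chosen for brevity, not the architectures or rank measure used in the paper.

```python
# Sketch: effective rank of the empirical FIM at random initialization for a
# tiny one-hidden-layer tanh network (participation ratio of the eigenvalues).
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 200, 10, 50
X = rng.normal(size=(n, d))
W1 = rng.normal(size=(d, h)) / np.sqrt(d)
w2 = rng.normal(size=h) / np.sqrt(h)

# Per-sample Jacobian of the scalar output w.r.t. all parameters (W1, w2).
A = np.tanh(X @ W1)                              # hidden activations, (n, h)
dW2 = A                                          # d out / d w2
dW1 = (X[:, :, None] * ((1 - A**2) * w2)[:, None, :]).reshape(n, d * h)
J = np.hstack([dW1, dW2])                        # (n, n_params)

F = J.T @ J / n                                  # empirical FIM for squared loss
eig = np.clip(np.linalg.eigvalsh(F), 0, None)
eff_rank = eig.sum() ** 2 / (eig ** 2).sum()     # participation ratio
print(f"parameters: {J.shape[1]}, effective rank: {eff_rank:.1f}")
```

The effective rank comes out far below the parameter count, which is the low-effective-dimension picture the abstract appeals to.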
A Layer-Wise Natural Gradient Optimizer for Training Deep Neural Networks
Second-order optimization algorithms, such as Newton's method and natural gradient descent (NGD), exhibit excellent convergence properties for training deep neural networks, but their high computational cost limits practical application. In this paper, we focus on NGD and propose a novel layer-wise natural gradient descent (LNGD) method to further reduce the computational cost and accelerate training. Specifically, based on a block-diagonal approximation of the Fisher information matrix, we first propose a layer-wise sampling method to compute each block matrix without performing a complete back-propagation. Then, each block matrix is approximated as a Kronecker product of two smaller matrices, one of which is diagonal, while keeping the trace equal before and after the approximation. Together, these two steps yield a new approximation of the Fisher information matrix that effectively reduces the computational cost while preserving the main information of each block. Moreover, we propose a new adaptive layer-wise learning rate to further accelerate training. Based on these components, we obtain the LNGD optimizer, for which a global convergence analysis is established under some assumptions. Experiments on image classification and machine translation tasks show that our method is competitive with state-of-the-art methods.
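A rough numpy sketch of the trace-matched Kronecker idea described above, applied to one layer's Fisher block: approximate $F \approx D \otimes B$ with $D$ diagonal, then rescale so the traces agree. The tiling scheme and the pooling used to form $B$ are assumptions for illustration, not the exact LNGD construction.

```python
# Sketch: approximate a layer Fisher block F by a Kronecker product D (x) B,
# with D diagonal, rescaled so that Tr(D (x) B) = Tr(F).
import numpy as np

def kron_diag_approx(F, m, k):
    """F: (m*k, m*k) layer Fisher block, viewed as an m x m grid of k x k tiles.
    Returns (D, B) with D diagonal (m x m) and B (k x k), trace-matched to F."""
    tiles = F.reshape(m, k, m, k).transpose(0, 2, 1, 3)   # tiles[i, j] is block (i, j)
    D = np.diag(np.array([np.trace(tiles[i, i]) for i in range(m)]))
    B = sum(tiles[i, i] for i in range(m))                # pooled diagonal tiles
    scale = np.trace(F) / (np.trace(D) * np.trace(B))     # enforce trace equality
    return D * scale, B

rng = np.random.default_rng(0)
m, k = 4, 3
G = rng.normal(size=(m * k, m * k))
F = G @ G.T / (m * k)                                     # symmetric PSD test block
D, B = kron_diag_approx(F, m, k)
print("trace error:", np.trace(np.kron(D, B)) - np.trace(F))
```

Inverting $D \otimes B$ only requires inverting the small factors, which is where the computational saving over the full block comes from.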
Spectral Concentration at the Edge of Stability: Information Geometry of Kernel Associative Memory
Recent advances using Kernel Logistic Regression (KLR) have demonstrated that learning can sculpt these landscapes to achieve capacities far exceeding classical limits [1-3]. Our previous phenomenological analysis identified a Ridge of Optimization where stability is maximized via a mechanism we termed Spectral Concentration, defined as a state where the weight spectrum exhibits a sharp hierarchy [4]. However, a deeper question remains: Why does the learning dynamics self-organize into this specific spectral state? Why does the system operate at the brink of instability? T o answer these questions, we must look beyond the Euclidean geometry of the weight parameters and consider the intrinsic geometry of the probability distributions they represent. This is the domain of Information Geometry [5]. In this work, we reinterpret the KLR Hopfield network as a statistical manifold equipped with a Fisher-Rao metric.